{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!guardrails hub install hub://guardrails/similar_to_previous_values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Check whether a value is similar to a set of other values\n",
    "\n",
    "**Using the SimilarToPreviousValues validator**\n",
    "\n",
    "This validator validates whether a new value is similar to a set of previously known values. It is useful for checking whether a new value is within a distribution of known values. It supports both integer and string values.\n",
    "\n",
    "For integer values, this validator checks whether the value lies within the specified standard deviations of the mean of the previous values. (Assumes that the previous values are normally distributed.)\n",
    "\n",
    "For string values, this validator checks whether the average semantic similarity between the generated value and the previous values is less than a threshold.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import guardrails as gd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the Guard with the SimilarToList validator\n",
    "from guardrails.hub import SimilarToPreviousValues\n",
    "\n",
    "guard = gd.Guard.from_string(\n",
    "    validators=[SimilarToPreviousValues(standard_deviations=2, threshold=0.2, on_fail=\"fix\")],\n",
    "    description=\"testmeout\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test with integer values\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n"
     ]
    }
   ],
   "source": [
    "# Test with a value that is within the distribution\n",
    "output = guard.parse(\n",
    "    llm_output='3',\n",
    "    metadata={\"prev_values\": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},\n",
    ")\n",
    "\n",
    "print(output.validated_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As 3 is within the distribution of the given prev_values, the validator returns 3 as it is.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "# Test with a value that is outside the distribution\n",
    "output = guard.parse(\n",
    "    llm_output='300',\n",
    "    metadata={\"prev_values\": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]},\n",
    ")\n",
    "\n",
    "print(output.validated_output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As 300 is not within the distribution of the given prev_values, the validator returns None.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test with string values\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define embed function\n",
    "# Create an embedding function that uses openAI Ada to embed the text.\n",
    "import numpy as np\n",
    "import openai\n",
    "\n",
    "# Define the embed function\n",
    "def embed_function(text: str) -> np.ndarray:\n",
    "    \"\"\"Embed the text using openAI Ada.\"\"\"\n",
    "    response = openai.embeddings.create(\n",
    "        model=\"text-embedding-ada-002\",\n",
    "        input=text,\n",
    "    )\n",
    "    embeddings = response.data[0].embedding\n",
    "    # response[\"data\"][0][\"embedding\"]\n",
    "    return np.array(embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cisco\n"
     ]
    }
   ],
   "source": [
    "# Test with a value that is similar to prev values\n",
    "# As it is, it will return that value\n",
    "output = guard.parse(\n",
    "    llm_output=\"cisco\",\n",
    "    metadata={\n",
    "        \"prev_values\": [\"broadcom\", \"paypal\"],\n",
    "        \"embed_function\": embed_function,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(output.validated_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n",
      "HTTP Request: POST https://api.openai.com/v1/embeddings \"HTTP/1.1 200 OK\"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "# Test with a value that is not similar to prev values\n",
    "# As it is not, it will return None\n",
    "output = guard.parse(\n",
    "    llm_output=\"taj mahal\",\n",
    "    metadata={\n",
    "        \"prev_values\": [\"broadcom\", \"paypal\"],\n",
    "        \"embed_function\": embed_function,\n",
    "    },\n",
    ")\n",
    "\n",
    "print(output.validated_output)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
